AITopics

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.41)

Neural Information Processing Systems, Dec-25-2025, 22:03:30 GMT

On the Value of Target Data in Transfer Learning

We aim to understand the value of additional labeled or unlabeled target data in transfer learning, for any given amount of source data; this is motivated by practical questions around minimizing sampling costs, whereby target data is usually harder or costlier to acquire than source data, but can yield better accuracy. To this end, we establish the first minimax rates in terms of both source and target sample sizes, and show that performance limits are captured by new notions of discrepancy between source and target, which we refer to as transfer exponents. Interestingly, we find that attaining minimax performance is akin to ignoring one of the source or target samples, provided distributional parameters were known a priori. Moreover, we show that practical decisions -- w.r.t.

name change, target data, transfer learning

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.45)

Steve Hanneke, Samory Kpotufe

On the Value of Target Data in Transfer Learning

Neural Information Processing Systems, Aug-20-2025, 00:17:38 GMT

Neural Information Processing Systems http://nips.cc/

classifier, discrepancy, unlabeled data

Chevalier, Dominik, Côté, Marie-Pier

From Point to probabilistic gradient boosting for claim frequency and severity prediction

arXiv.org Machine Learning, Dec-19-2024

Gradient boosting for decision tree algorithms are increasingly used in actuarial applications as they show superior predictive performance over traditional generalized linear models. Many improvements and sophistications to the first gradient boosting machine algorithm exist. We present in a unified notation, and contrast, all the existing point and probabilistic gradient boosting for decision tree algorithms: GBM, XGBoost, DART, LightGBM, CatBoost, EGBM, PGBM, XGBoostLSS, cyclic GBM, and NGBoost. In this comprehensive numerical study, we compare their performance on five publicly available datasets for claim frequency and severity, of various sizes and comprising different numbers of (high-cardinality) categorical variables. We explain how varying exposure-to-risk can be handled with boosting in frequency models. We compare the algorithms on the basis of computational efficiency, predictive performance, and model adequacy. LightGBM and XGBoostLSS win in terms of computational efficiency. The fully interpretable EGBM achieves competitive predictive performance compared to the black box algorithms considered. We find that there is no trade-off between model adequacy and predictive accuracy: both are achievable simultaneously.
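The remark on varying exposure-to-risk can be illustrated with a minimal sketch. It assumes a Poisson frequency model with a log link, where exposure enters as an offset so that the expected claim count is exposure × exp(score). The function names and the toy portfolio are hypothetical, and this is not the code of any library compared in the paper.

```python
import math

def initial_score(counts, exposures):
    """Constant starting score: log of the overall claim rate per unit of exposure."""
    return math.log(sum(counts) / sum(exposures))

def poisson_negative_gradient(counts, exposures, scores):
    """Pseudo-residuals for a Poisson loss with a log-exposure offset.

    The expected count is exposure * exp(score), so the negative gradient of
    the negative log-likelihood with respect to the score is observed - expected.
    """
    return [y - e * math.exp(f) for y, e, f in zip(counts, exposures, scores)]

# Hypothetical toy portfolio: claim counts and policy-years of exposure.
counts = [0, 1, 0, 2]
exposures = [0.5, 1.0, 0.25, 1.0]
f0 = initial_score(counts, exposures)
residuals = poisson_negative_gradient(counts, exposures, [f0] * len(counts))
```

A boosting iteration would then fit a tree to `residuals`. Under these assumptions, a policy observed for half a year is expected to produce half the claims of an otherwise identical full-year policy.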

algorithm, artificial intelligence, machine learning

2412.14916

Country:

North America (0.46)
Europe (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Insurance (1.00)
Transportation (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Neural Information Processing Systems, Oct-10-2024, 19:30:40 GMT

On the Value of Target Data in Transfer Learning

distributional parameter, target data, transfer learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.66)

Byambadalai, Undral, Oka, Tatsushi, Yasui, Shota

Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction

arXiv.org Machine Learning, Jul-22-2024

We propose a novel regression adjustment method designed for estimating distributional treatment effect parameters in randomized experiments. Randomized experiments have been extensively used to estimate treatment effects in various scientific fields. However, to gain deeper insights, it is essential to estimate distributional treatment effects rather than relying solely on average effects. Our approach incorporates pre-treatment covariates into a distributional regression framework, utilizing machine learning techniques to improve the precision of distributional treatment effect estimators. The proposed approach can be readily implemented with off-the-shelf machine learning methods and remains valid as long as the nuisance components are reasonably well estimated. Also, we establish the asymptotic properties of the proposed estimator and present a uniformly valid inference method. Through simulation results and real data analysis, we demonstrate the effectiveness of integrating machine learning techniques in reducing the variance of distributional treatment effect estimators in finite samples.
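For orientation, the sketch below computes the simple unadjusted estimator that a regression adjustment would aim to improve on. It assumes the distributional treatment effect at a threshold is defined as the difference between the treated and control outcome CDFs at that threshold. It does not implement the regression adjustment proposed in the paper, and the function names and data are hypothetical.

```python
def empirical_cdf(sample, threshold):
    """Share of observations less than or equal to the threshold."""
    return sum(1 for y in sample if y <= threshold) / len(sample)

def distributional_treatment_effect(treated, control, thresholds):
    """Difference in empirical CDFs (treated minus control) at each threshold."""
    return [empirical_cdf(treated, t) - empirical_cdf(control, t) for t in thresholds]

# Hypothetical outcomes from a small randomized experiment.
treated = [3.0, 4.0, 5.0, 6.0]
control = [1.0, 2.0, 3.0, 4.0]
effects = distributional_treatment_effect(treated, control, thresholds=[2.0, 4.0])
# effects == [-0.5, -0.5]: treatment lowers the share of outcomes at or below each threshold
```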

distributional treatment effect, estimator, experiment

2407.16037

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Wetscher, Mattias, Seiler, Johannes, Stauffer, Reto, Umlauf, Nikolaus

Stagewise Boosting Distributional Regression

arXiv.org Machine Learning, May-28-2024

Forward stagewise regression is a simple algorithm that can be used to estimate regularized models. The updating rule adds a small constant to a regression coefficient in each iteration, such that the underlying optimization problem is solved slowly with small improvements. This is similar to gradient boosting, with the essential difference that the step size is determined by the product of the gradient and a step length parameter in the latter algorithm. One often overlooked challenge in gradient boosting for distributional regression is the issue of a vanishingly small gradient, which practically halts the algorithm's progress. We show that gradient boosting in this case oftentimes results in suboptimal models, especially for complex problems, where certain distributional parameters are never updated due to the vanishing gradient. Therefore, we propose a stagewise boosting-type algorithm for distributional regression, combining stagewise regression ideas with gradient boosting. Additionally, we extend it with a novel regularization method, correlation filtering, to provide additional stability when the problem involves a large number of covariates. Furthermore, the algorithm includes best-subset selection for parameters and can be applied to big data problems by leveraging stochastic approximations of the updating steps. Besides the advantage of processing large datasets, the stochastic nature of the approximations can lead to better results, especially for complex distributions, by reducing the risk of being trapped in a local optimum. The performance of our proposed stagewise boosting distributional regression approach is investigated in an extensive simulation study and by estimating a full probabilistic model for lightning counts with data of more than 9.1 million observations and 672 covariates.
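The fixed-step update described above can be sketched for an ordinary linear model with squared-error loss. This is the classic incremental forward stagewise procedure, not the distributional algorithm proposed in the paper; the function name, step size, and toy data are assumptions chosen for illustration.

```python
def forward_stagewise(X, y, step=0.01, n_iter=1000):
    """Incremental forward stagewise linear regression (no intercept).

    Each iteration moves the coefficient most correlated with the current
    residual by a fixed amount `step`, regardless of the gradient's magnitude.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_iter):
        # Inner product of each covariate with the residual, i.e. the
        # negative gradient of the squared-error loss 0.5 * sum(residual^2).
        grads = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(grads[k]))
        if grads[j] == 0.0:
            break
        # Only the sign of the gradient is used; its size is ignored.
        delta = step if grads[j] > 0 else -step
        beta[j] += delta
        for i in range(n):
            residual[i] -= delta * X[i][j]
    return beta

# Hypothetical toy data generated from y = 2 * x1 - 1 * x2.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, -1.0, -2.0, 1.0]
beta = forward_stagewise(X, y)
```

A gradient boosting update would instead scale the move by the gradient itself, so a gradient near zero yields almost no progress. The fixed step keeps each coefficient moving at a constant rate.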

algorithm, correlation, gradient

2405.18288

Country:

Europe > Austria > Vienna (0.14)
Europe > Austria > Tyrol > Innsbruck (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Chhachhi, Saurab, Teng, Fei

On the 1-Wasserstein Distance between Location-Scale Distributions and the Effect of Differential Privacy

arXiv.org Machine Learning, Apr-28-2023

We provide exact expressions for the 1-Wasserstein distance between independent location-scale distributions. The expressions are represented using location and scale parameters and special functions such as the standard Gaussian CDF or the Gamma function. Specifically, we find that the 1-Wasserstein distance between independent univariate location-scale distributions is equivalent to the mean of a folded distribution within the same family whose underlying location and scale are equal to the difference of the locations and scales of the original distributions. A new linear upper bound on the 1-Wasserstein distance is presented and the asymptotic bounds of the 1-Wasserstein distance are detailed in the Gaussian case. The effect of differential privacy using the Laplace and Gaussian mechanisms on the 1-Wasserstein distance is studied using the closed-form expressions and bounds.
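As a worked instance of the folded-distribution statement, the sketch below evaluates the Gaussian case using the standard formula for the mean of a folded normal. It reads the abstract's statement literally (location equal to the difference of the means, scale equal to the absolute difference of the standard deviations). The function name is hypothetical and the code is not taken from the paper.

```python
import math

def w1_gaussian(mu1, sigma1, mu2, sigma2):
    """1-Wasserstein distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Computed as the mean of a folded normal with location mu1 - mu2 and
    scale |sigma1 - sigma2|.
    """
    d_mu = mu1 - mu2
    d_sigma = abs(sigma1 - sigma2)
    if d_sigma == 0.0:
        return abs(d_mu)  # equal scales: the distributions differ by a pure shift
    return (d_sigma * math.sqrt(2.0 / math.pi) * math.exp(-d_mu**2 / (2.0 * d_sigma**2))
            + d_mu * math.erf(d_mu / (d_sigma * math.sqrt(2.0))))
```

Two limiting cases give a quick check: with equal scales the distance reduces to the absolute difference of the means, and with equal means it reduces to |sigma1 - sigma2| times sqrt(2/pi).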

1-wasserstein distance, artificial intelligence, machine learning

2304.14869

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Thielmann, Anton, Kruse, René-Marcel, Kneib, Thomas, Säfken, Benjamin

Neural Additive Models for Location Scale and Shape: A Framework for Interpretable Neural Regression Beyond the Mean

arXiv.org Artificial Intelligence, Jan-27-2023

Deep neural networks (DNNs) have proven to be highly effective in a variety of tasks, making them the go-to method for problems requiring high-level predictive power. Despite this success, the inner workings of DNNs are often not transparent, making them difficult to interpret or understand. This lack of interpretability has led to increased research on inherently interpretable neural networks in recent years. Models such as Neural Additive Models (NAMs) achieve visual interpretability through the combination of classical statistical methods with DNNs. However, these approaches only concentrate on mean response predictions, leaving out other properties of the response distribution of the underlying data. We propose Neural Additive Models for Location Scale and Shape (NAMLSS), a modelling framework that combines the predictive power of classical deep learning models with the inherent advantages of distributional regression while maintaining the interpretability of additive models.
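The idea of modelling more than the mean with additive components can be sketched as follows. The example assumes a Gaussian response whose mean and scale are each a sum of per-feature shape functions, with a softplus link keeping the scale positive. Plain Python callables stand in for per-feature subnetworks; the link choice and all names are assumptions, not the NAMLSS implementation.

```python
import math

def softplus(z):
    """Smooth map from the real line to positive values, used for the scale."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)

def additive_predictor(shape_functions, x):
    """Sum of per-feature contributions f_j(x_j)."""
    return sum(f(xj) for f, xj in zip(shape_functions, x))

def gaussian_nll(y, x, mean_functions, scale_functions):
    """Negative log-likelihood of y under N(mean(x), scale(x)^2)."""
    mean = additive_predictor(mean_functions, x)
    scale = softplus(additive_predictor(scale_functions, x))
    return 0.5 * math.log(2.0 * math.pi) + math.log(scale) + 0.5 * ((y - mean) / scale) ** 2

# Hypothetical shape functions for two features.
mean_functions = [lambda v: 2.0 * v, lambda v: -v]
scale_functions = [lambda v: 0.0, lambda v: v]
loss = gaussian_nll(1.5, [1.0, 0.5], mean_functions, scale_functions)
```

Because each feature contributes through its own function, the effect of a single feature on the mean or on the scale can be plotted in isolation, which is the interpretability property the abstract refers to.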

artificial intelligence, deep learning, machine learning

2301.11862

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jan-12-2023

Multimodal Deep Learning

Akkus, Cem, Chu, Luyang, Djakovic, Vladana, Jauch-Walser, Steffen, Koch, Philipp, Loss, Giacomo, Marquardt, Christopher, Moldovan, Marco, Sauter, Nadja, Schneider, Maximilian, Schulte, Rickmer, Urbanczyk, Karol, Goschenhofer, Jann, Heumann, Christian, Hvingelby, Rasmus, Schalk, Daniel, Aßenmacher, Matthias

FIGURE 1: LMU seal (left) style-transferred to Van Gogh's Sunflower painting (center) and blended with the prompt - Van Gogh, sunflowers - via CLIP+VGAN (right).

In the last few years, there have been several breakthroughs in the methodologies used in Natural Language Processing (NLP) as well as Computer Vision (CV). Beyond these improvements on single-modality models, large-scale multimodal approaches have become a very active area of research. In this seminar, we reviewed these approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed where one modality is transformed into the other (Chapter 3.1 and Chapter 3.2), as well as models in which one modality is utilized to enhance representation learning for the other (Chapter 3.3 and Chapter 3.4). To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced (Chapter 3.5). Finally, we also cover other modalities (Chapter 4.1 and Chapter 4.2) as well as general-purpose multi-modal models (Chapter 4.3), which are able to handle different tasks on different modalities within one unified architecture.

large language model, machine learning, natural language

2301.04856

Country:

North America > Canada > Ontario > Toronto (0.13)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Summary/Review (1.00)
Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)